Support multiple publishing formats such as PDF (Portable Document Format, supported by Adobe Acrobat), SVG (Scalable Vector Graphics, for images), RSS (Rich Site Summary, for Web portals), and more.
In most text categorization algorithms, the text or document is represented using the Vector Space Model.
The order in which terms appear in a document is lost in the vector space representation.
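A minimal sketch of why term order is lost, assuming a naive lowercase whitespace tokenizer and raw term counts as the vector components (both are illustrative choices, not taken from the source):

```python
from collections import Counter

def term_vector(text):
    """Map a text to raw term counts (a bag of words).

    Tokenization is a naive lowercase whitespace split, assumed here
    only for illustration.
    """
    return Counter(text.lower().split())

# Two sentences with different word order and different meaning...
a = term_vector("the dog bit the man")
b = term_vector("the man bit the dog")

# ...map to exactly the same vector, because only counts are kept.
same = (a == b)
```

Any model built on such vectors cannot distinguish the two sentences; n-gram features are one common way to retain some local order.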
To improve document clustering, a document similarity measure is proposed that is based on the cosine between vectors and combines a domain ontology with the keyword frequencies in documents.
Parallel conversion of text documents to vectors using LLR-based n-gram generation
We build document feature vectors and a user interest vector, compute their correlation using the vector space model, and return the results that interest the user.
Analysis of User Need Expression in a Document Vector Space Environment
The advent of search engines solved the problem of locating information, but first-generation search engines such as AltaVista provide only a full-text index, and their ranking strategy relies solely on the cosine similarity between a query vector and a document vector.
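A minimal sketch of ranking by cosine similarity between a query vector and document vectors. The raw term-count weighting, the whitespace tokenizer, and the helper names `cosine` and `rank` are assumptions for illustration; this is not a description of how AltaVista was implemented.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors (dict-like)."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (score, document) pairs sorted by descending cosine similarity."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(d.lower().split())), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = ["vector space model", "support vector machine", "relational storage"]
results = rank("vector space", docs)
```

Here the document sharing both query terms ranks first, the one sharing a single term second, and the one sharing none last with a score of zero.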
In the third step, the relevance between the document vector of each retrieved result and the user model vector is computed with the vector model algorithm, the results are sorted by relevance, and the document vectors are saved.
Research on Storing XML Documents in Relational Storage Based on Segmented Bit-Vector Coding Schemes
Classifying Japanese Web Documents with a Support Vector Machine (SVM)
Traditional filtering based on frequency thresholds often loses effective information, which reduces classification accuracy. This paper therefore introduces rough set theory into text categorization (TC) and presents a new reduction algorithm that greatly reduces the dimensionality of document vectors.
Based on the term distribution and content-representing ability of the different tag fields of an HTML document, this paper proposes an improved Vector Space Model.
Measuring Document Similarity Based on Compressed Sparse Matrix-Vector Multiplication
A Study of Document Clustering Algorithms Based on the Vector Space Model
In network information retrieval, classification, clustering, ranking, and relevance feedback based on the document vector space all require computing similarity.
This paper discusses the theory, algorithms, and applications of intelligent agents, natural language processing, document representation, Web search, document classification, and support vector machines, and presents the detailed design and implementation of the Web News Hunter Intelligent Agent System.
An Improved Algorithm for Web Document Representation Based on the Vector Space Model
It constructs separate subject and keyword feature vectors for each document using a new document feature extraction method.
First, after introducing the theoretical background of mail as a special kind of document, this paper analyzes the mail format and proposes a Vector Space Model (VSM) for mail documents.
Then, document representation techniques are studied, including the vector space model representation of text, the selection of feature terms, and the representation of text as a structured vector space model.
In the document representation module of this model, the vector space model is used to preprocess the document resources shared by nodes in order to form the features of the documents.
This is a static summary of the characteristics of the medium of official document writing.
Depending on the outcome of query disambiguation (success or failure), this dissertation proposes two document semantic relevance measures: document relevance based on the Semantic Vector Space Model and document relevance based on the Word Vector Space Model.
First, the samples to be examined and the topic set are analyzed, the event arguments and their descriptive information are extracted, and a document vector space model is constructed.
The user interest model combines a recent search term vector, a historical search keyword vector, a document center vector, and a query catalog vector.
The collected information is first preprocessed, which includes text segmentation, feature selection, and construction of document feature vectors.
Based on this analysis, each term's weight in the document vector is multiplied by the weight obtained during topic keyword extraction, and the revised document vector is then used for similarity calculation.
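A minimal sketch of the revision step, assuming sparse vectors stored as dictionaries. The function name `revise_vector` and the treatment of terms that have no topic keyword weight (left unchanged through a default factor of 1.0) are assumptions for illustration; the source does not say how such terms are handled.

```python
def revise_vector(doc_vector, keyword_weights, default=1.0):
    """Multiply each term weight by its topic keyword weight.

    doc_vector:      term -> weight in the document vector
    keyword_weights: term -> weight from topic keyword extraction
    default:         factor for terms without a keyword weight
                     (assumption: 1.0 leaves them unchanged)
    """
    return {
        term: weight * keyword_weights.get(term, default)
        for term, weight in doc_vector.items()
    }

doc = {"vector": 2.0, "model": 1.0, "the": 4.0}
topic = {"vector": 0.5, "the": 0.0}
revised = revise_vector(doc, topic)
```

The revised vector, rather than the original one, would then be passed to the similarity calculation.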
Statistical text categorization is usually based on the vector space model, in which a document is represented by a feature vector.
In current Web mining technology, the core algorithms, especially those for Web document classification and clustering, are based on the Vector Space Model (VSM) built on word-frequency statistics.